ABSTRACT


In this paper we review a class of filters for which the two stages
of classic filter design, estimating the signal statistics and
formulating the filter response, are performed simultaneously and
iteratively. The filter coefficients are changed by a recursive
algorithm which corrects the filter response during the processing
of the input data.


OPTIMAL ADAPTIVE NON-RECURSIVE FILTERS


I.  INTRODUCTION: 

Filters are applied to time sequences to extract desired
signals from random noise and from interfering signals. The
specific form of the filter is related to a priori knowledge of
the statistics of the desired and the undesired signals. Classic
filter design entails two distinct stages. The first is to
employ some monitoring and smoothing scheme which will lead to
estimates of the signal statistics such as the signal and noise
covariance functions or power spectra. The second stage is to
formulate the filter response in terms of these estimated
statistics.

In this paper we review a class of filters for which the
two stages just described are performed simultaneously and
iteratively. The filter coefficients are changed by a recursive
algorithm which corrects the filter response during the processing
of the input data. The capability to modify the filter
response during operation makes it possible to track and to filter
signals with slowly changing statistics.

II.  BACKGROUND:

A discrete non-recursive filter is one for which the
present output y(k) is obtained as a weighted summation of the
past, present, and future inputs x(k+j) with $-N \le j \le +N$. The
output of such a filter is simply the finite convolution sum
shown in Eq (1).

$$ y(k) = \sum_{j=-N}^{N} w_j \, x(k+j) \tag{1} $$

For realizability, the index of summation is restricted to be
non-negative as shown in Eq (2).

$$ y(k) = \sum_{j=0}^{N} w_j \, x(k-j) \tag{2} $$

A block diagram of such a filter is shown in Figure 1.

FIGURE 1.  A REALIZABLE NON-RECURSIVE FILTER


A time varying non-recursive filter is one for which the weighting
coefficients w_j are allowed to change in some prescribed manner
prior to each filter computation. Thus the w_j's are functions of
the time index as well as the position index, and we will use the
notation w_j(k). This is shown in Eq (3).

$$ y(k) = \sum_{j=0}^{N} w_j(k) \, x(k-j) \tag{3} $$

For simplicity of notation, let us define W(k) as the N+1
dimensional column vector of weights at time k, and X(k) as the
N+1 dimensional column vector of data at time k. These vectors
are shown in Eq (4).

$$ W(k) = \begin{bmatrix} w_0(k) \\ w_1(k) \\ w_2(k) \\ \vdots \\ w_N(k) \end{bmatrix},
\qquad
X(k) = \begin{bmatrix} x(k) \\ x(k-1) \\ x(k-2) \\ \vdots \\ x(k-N) \end{bmatrix} \tag{4} $$

Then Eq (3) defining y(k) can be written as a simple inner
product as indicated in Eq (5), where of course the superscript
"T" indicates the transpose of the vector.

$$ y(k) = W^T(k) X(k) = X^T(k) W(k) \tag{5} $$

An adaptive non-recursive filter is one for which the
weight vector W(k) is computed by an algorithm to reduce a cost
function. This cost function compares the output of the filter
y(k) to an auxiliary input v(k) and is classically selected
to be a quadratic function of the difference.
It is precisely this cost function and the corresponding
algorithms that this paper addresses.
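
The inner-product form of Eq (5) can be sketched in a few lines of
Python. This is only an illustration: the helper names data_vector and
filter_output, the zero padding before the start of the record, and the
sample numbers are assumptions made here, not part of the original
report.

```python
def data_vector(x, k, n_taps):
    """Build X(k) = [x(k), x(k-1), ..., x(k-N)] as in Eq (4).

    Samples before the start of the record are taken as zero
    (an assumption made for this sketch).
    """
    return [x[k - j] if k - j >= 0 else 0.0 for j in range(n_taps)]


def filter_output(w, x_vec):
    """Inner product y(k) = W^T(k) X(k) of Eq (5)."""
    return sum(wj * xj for wj, xj in zip(w, x_vec))


x = [1.0, 2.0, 3.0, 4.0]       # hypothetical input sequence
w = [0.5, 0.25, 0.25]          # N = 2, hence N + 1 = 3 weights
y = [filter_output(w, data_vector(x, k, len(w))) for k in range(len(x))]
print(y)
```

For a time varying filter the same call is used with a different weight
list w at each time index k.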

III.  THE NORMAL EQUATION:

The model of the filter we are examining is shown in
Figure 2.


FIGURE 2.  ADAPTIVE NON-RECURSIVE FILTER


From Figure 2, we define the error at time k, e(k), by Eq (6).

\begin{align}
e(k) &= v(k) - y(k) \tag{6a} \\
     &= v(k) - W^T(k) X(k) \tag{6b}
\end{align}

The square of this error, e^2(k), is shown in Eq (7).

\begin{align}
e^2(k) &= [v(k) - W^T(k) X(k)]^2 \tag{7a} \\
       &= v^2(k) - 2\, v(k) X^T(k) W(k) + W^T(k) X(k) X^T(k) W(k) \tag{7b}
\end{align}

The expected value of this squared error (or mean square error) is
shown in Eq (8),

$$ E\{e^2(k)\} = R_{vv}(k) - 2 R_{vx}^T(k) W(k) + W^T(k) R_{xx}(k) W(k) \tag{8} $$

where R_vx(k) is the correlation vector between the desired signal
v(k) and the data vector X(k) as shown in Eq (9),

$$ R_{vx}(k) = E\{v(k) X(k)\} \tag{9} $$

and where R_xx(k) is the covariance matrix of the input data vector
as shown in Eq (10),

$$ R_{xx}(k) = E\{X(k) X^T(k)\} \tag{10a} $$

and where R_vv(k) = E{v^2(k)} is the covariance of the desired
signal v(k).

We wish to minimize this mean squared error with respect to the
weight vector W(k). We accomplish this by computing the first
variation (or the gradient) of the squared error with respect to
the weight vector W(k) and then setting this first variation to
zero. This is shown in Eq (11).

\begin{align}
\delta E\{e^2(k)\} &= -2 R_{vx}^T(k)\, \delta W(k) + \delta W^T(k) R_{xx}(k) W(k) + W^T(k) R_{xx}(k)\, \delta W(k) \tag{11a} \\
                   &= -2\, \delta W^T(k) [R_{vx}(k) - R_{xx}(k) W(k)] \tag{11b}
\end{align}

The first variation is zero if

$$ R_{vx}(k) - R_{xx}(k) W(k) = 0 \tag{11c} $$

We note that Eq (11c) is called the Normal Equation, and the
solution W(k) which minimizes the mean squared error can be
obtained by inverting the data covariance matrix R_xx(k). The
resultant solution shown in Eq (12) is the Wiener-Hopf solution
in matrix form.

$$ W(k) = R_{xx}^{-1}(k) R_{vx}(k) \tag{12} $$

Equation (11c) can also be solved iteratively by gradient
descent techniques which we will now examine.

IV.  GRADIENT  DESCENT  TO  NORMAL  SOLUTION: 

We define at time k, the i-th iterative approximation to
W(k) by W_i(k). We also define at time k, the i-th residual r_i(k)
obtained by trying W_i(k) in the normal equation. This is shown
in Eq (13).

$$ r_i(k) = R_{vx}(k) - R_{xx}(k) W_i(k) \tag{13} $$

We simply seek a technique which will, at each iteration,
reduce the residual r_i(k). We note that the residual r_i(k) is
minus one half the gradient vector of the mean squared error
evaluated at W_i(k), see Eq (11b). Of course reducing the residual
to zero is the same as reducing the gradient to zero, which is the
first necessary condition for an unconstrained local extremum. Since
the mean squared error is simply quadratic in the weight vector,
there is only one extremum, which means that the local extremum
corresponds to the optimal weight vector.

The iterative correction technique is simply to change
the weight vector W_i(k) in the direction of the negative gradient,
that is, along the residual. The negative gradient points downhill
towards the minimum. The change in the weight vector is shown in
Eq (14).

$$ W_{i+1}(k) = W_i(k) + \alpha_i(k)\, r_i(k) \tag{14} $$


The scalar \alpha_i(k) controls the rate of convergence of the
algorithm by establishing the size of the step in the descent
direction. It is also simple to bound the step size to assure
convergence. This will be done later. For now we will derive the
value of \alpha_i(k) which maximizes the rate of convergence. When
this optimal value is used in the gradient descent algorithm,
the algorithm is called steepest descent. We determine the
optimal \alpha_i(k) by substituting Eq (14) into Eq (8). We obtain
Eq (15).

\begin{align}
E\{e^2(k)\} &= R_{vv}(k) - 2 R_{vx}^T(k) [W_i(k) + \alpha_i(k) r_i(k)] \notag \\
            &\quad + [W_i(k) + \alpha_i(k) r_i(k)]^T R_{xx}(k) [W_i(k) + \alpha_i(k) r_i(k)] \tag{15}
\end{align}

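We now take the first variation with respect to \alpha_i(k) and
obtain Eq (16).

\begin{align}
\delta E\{e^2(k)\} &= -2\, \delta\alpha_i(k) [R_{vx}^T(k) r_i(k)]
                      + \delta\alpha_i(k)\, r_i^T(k) R_{xx}(k) W_i(k) \notag \\
                   &\quad + W_i^T(k) R_{xx}(k)\, \delta\alpha_i(k)\, r_i(k)
                      + 2\, \delta\alpha_i(k)\, \alpha_i(k)\, r_i^T(k) R_{xx}(k) r_i(k) \tag{16a} \\
                   &= -2\, \delta\alpha_i(k) [R_{vx}^T(k) r_i(k) - r_i^T(k) R_{xx}(k) W_i(k)
                      - \alpha_i(k)\, r_i^T(k) R_{xx}(k) r_i(k)] \tag{16b} \\
                   &= -2\, \delta\alpha_i(k) \{ r_i^T(k) [R_{vx}(k) - R_{xx}(k) W_i(k)]
                      - \alpha_i(k) [r_i^T(k) R_{xx}(k) r_i(k)] \} \tag{16c} \\
                   &= -2\, \delta\alpha_i(k) \{ r_i^T(k) r_i(k) - \alpha_i(k)\, r_i^T(k) R_{xx}(k) r_i(k) \} \tag{16d}
\end{align}

The first variation is zero if Eq (17) holds.

$$ r_i^T(k) r_i(k) - \alpha_i(k) [r_i^T(k) R_{xx}(k) r_i(k)] = 0 \tag{17a} $$

or

$$ \alpha_i(k) = \frac{r_i^T(k) r_i(k)}{r_i^T(k) R_{xx}(k) r_i(k)} \tag{17b} $$

The total algorithm for the method of steepest descent is
presented in Eq (18).

$$ r_i(k) = R_{vx}(k) - R_{xx}(k) W_i(k) \tag{18a} $$

$$ \alpha_i(k) = \frac{r_i^T(k) r_i(k)}{r_i^T(k) R_{xx}(k) r_i(k)} \tag{18b} $$

$$ W_{i+1}(k) = W_i(k) + \alpha_i(k)\, r_i(k) \tag{18c} $$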


Note that the index of iteration of the steepest descent is "i",
and that the index "k" is the one which evolves with time to
allow for time varying statistics. Also note that if the scalar
\alpha_i(k) is selected to be a constant \alpha(k), the method is
simply a gradient descent (as opposed to a steepest descent).
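
The iteration of Eq (18) can be sketched as follows for a fixed time
index k. The function name steepest_descent, the stopping tolerance,
and the 2x2 statistics r_xx and r_vx are hypothetical values chosen
only to make the example concrete; they are not taken from the report.

```python
def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent(r_xx, r_vx, w0, iterations=50, tol=1e-24):
    """Iterate Eq (18) on the index i for fixed statistics."""
    w = list(w0)
    for _ in range(iterations):
        res = [b - c for b, c in zip(r_vx, matvec(r_xx, w))]   # Eq (18a)
        rr = dot(res, res)
        if rr < tol:               # residual is already negligible
            break
        alpha = rr / dot(res, matvec(r_xx, res))               # Eq (18b)
        w = [wi + alpha * ri for wi, ri in zip(w, res)]        # Eq (18c)
    return w


r_xx = [[2.0, 0.5], [0.5, 1.0]]    # assumed data covariance matrix
r_vx = [1.0, 1.0]                  # assumed correlation vector
w = steepest_descent(r_xx, r_vx, [0.0, 0.0])
print(w)
```

The result can be checked against the normal equation directly: for
these assumed statistics the solution of Eq (12) is W = [2/7, 6/7].
Replacing alpha with a fixed constant turns the loop into the simple
gradient descent mentioned above.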

The gradient algorithm can be described by a signal flow
graph. The graph presents the inter-relations of the algorithm
in a clear, concise manner and suggests the basic techniques for
analyzing the algorithm. Figure 3 is the signal flow graph of
the gradient algorithm.


FIGURE 3.  SIGNAL FLOW GRAPH OF GRADIENT DESCENT ALGORITHM

The gain computation box in Figure 3 is required for the
steepest descent algorithm but is omitted for the simple
gradient descent. Note that if the gain changes at each iteration
the algorithm is time varying and becomes more difficult to
analyze.


A practical problem with the steepest descent algorithm is that
sample statistics must be formed to estimate R_vx and R_xx. The
computational burden to obtain these estimates may be significant,
and if the statistics are slowly varying they have to be
periodically recomputed. Also, the sample statistics are only
estimates and as such have a non-zero variance.

We now address an extension of the gradient descent algorithm
known as the Least Mean Square (LMS) adaptive algorithm. This
algorithm constructs an estimate to the solution of the Normal
Equation without constructing the sample covariance matrix.

V.  LEAST  MEAN  SQUARE  ALGORITHM: 

The gradient descent technique described in the previous
section requires the computation of the gradient of the mean
square error function at each iteration. The LMS algorithm uses
a noisy estimate to the gradient for its descent. The estimate
is obtained by computing the gradient of the instantaneous squared
error (rather than the gradient of the mean squared error). The
squared error is defined in Eq (7b) and the first variation with
respect to the weight vector W(k) is shown in Eq (19).

\begin{align}
\delta e^2(k) &= -2\, \delta W^T(k) X(k) v(k) + \delta W^T(k) X(k) X^T(k) W(k)
                 + W^T(k) X(k) X^T(k)\, \delta W(k) \tag{19a} \\
              &= -2\, \delta W^T(k) [X(k) v(k) - X(k) X^T(k) W(k)] \tag{19b} \\
              &= -\delta W^T(k) \cdot 2 X(k) [v(k) - X^T(k) W(k)] \tag{19c}
\end{align}

The gradient of the instantaneous squared error is shown in Eq (20).

$$ \nabla_W [e^2(k)] = -2 X(k) [v(k) - X^T(k) W(k)] \tag{20} $$

The estimated gradient is unbiased as demonstrated by forming
its expected value. This is shown in Eq (21) which incorporates
the definitions presented in Eqs (9) and (10).

\begin{align}
E\{\nabla [e^2(k)]\} &= -2 E\{X(k) [v(k) - X^T(k) W(k)]\} \tag{21a} \\
                     &= -2 [R_{vx}(k) - R_{xx}(k) W(k)] \tag{21b}
\end{align}


Comparing equations (13) and (21b), we see

$$ E\{\nabla [e^2(k)]\} = \nabla E\{e^2(k)\} \tag{21c} $$

The steepest descent LMS algorithm is essentially the same as
the steepest descent gradient algorithm shown in Eq (18).
Replacing the correlation vector R_vx(k) and the covariance matrix
R_xx(k) = E{X(k)X^T(k)} by their instantaneous estimates v(k)X(k)
and X(k)X^T(k), we obtain the steepest descent LMS algorithm as
shown in Eq (22).

$$ e(k) = v(k) - X^T(k) W(k) \tag{22a} $$

$$ r(k) = X(k)\, e(k) \tag{22b} $$

$$ \alpha(k) = \frac{1}{X^T(k) X(k)} \tag{22c} $$

$$ W(k+1) = W(k) + \alpha(k)\, r(k) \tag{22d} $$

Or more compactly,

$$ W(k+1) = W(k) + \alpha(k) X(k) [v(k) - X^T(k) W(k)] \tag{22e} $$

Note that equation (22) iterates on the time index "k" while
equation (18) iterates on an auxiliary index "i". The LMS
algorithm presented by Widrow employs a constant value of \alpha
and as such is a simple gradient descent as opposed to a steepest
descent algorithm. The LMS steepest descent algorithm can also
be represented by a signal flow graph. This is presented in
Figure 4.
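
As an illustration of Eq (22), the sketch below identifies a short
unknown filter from noise-free data. The function name lms_step, the
"unknown" weights w_true, the random input, and the seed are all
assumptions made for this example only.

```python
import random


def lms_step(w, x_vec, v, alpha=None):
    """One update of Eq (22).

    With alpha=None the steepest descent step 1/(X^T X) of Eq (22c)
    is used; passing a constant alpha gives the simple gradient form.
    """
    e = v - sum(wj * xj for wj, xj in zip(w, x_vec))            # Eq (22a)
    if alpha is None:
        power = sum(xj * xj for xj in x_vec)
        alpha = 1.0 / power if power > 0.0 else 0.0             # Eq (22c)
    w_next = [wj + alpha * xj * e for wj, xj in zip(w, x_vec)]  # Eq (22e)
    return w_next, e


random.seed(0)
w_true = [0.8, -0.4, 0.2]          # hypothetical system to be identified
x = [random.gauss(0.0, 1.0) for _ in range(500)]
w = [0.0, 0.0, 0.0]
for k in range(len(x)):
    x_vec = [x[k - j] if k - j >= 0 else 0.0 for j in range(3)]
    v = sum(a * b for a, b in zip(w_true, x_vec))   # noise-free desired signal
    w, e = lms_step(w, x_vec, v)
print(w)
```

Because the desired signal here contains no noise, the weights settle
on w_true; with noisy data the constant-alpha form trades convergence
speed against the misadjustment discussed in the text that follows.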


FIGURE 4a.  SIGNAL FLOW GRAPH OF LMS DESCENT ALGORITHM
FIGURE 4b.  FACTORED FORM OF SIGNAL FLOW GRAPH

Comparing Figures 4 and 3 we note that the difference between
the algorithms lies in the use of approximations to construct
the gradient (or residual). Since the approximations in Figure 4
are noisy estimates of the parameters of Figure 3, it stands to
reason that the resultant gradients are also noisy estimates.
Thus the LMS algorithm differs from the gradient algorithms by
the presence of gradient noise within the outer feedback loop,
see Figure 5.


FIGURE 5.  MODEL OF LMS ALGORITHM:
           GRADIENT ALGORITHM WITH ADDITIONAL GRADIENT NOISE

The effect of the gradient noise on the steady state solution
for the weight vector is the addition of an error component to
the weight vector called misadjustment. Since the gradient noise
is within the feedback loop, its effect on steady state response
(misadjustment of weight vector or excess mean square error) will
be scaled by the reciprocal of the loop gain. Hence for small
steady state error, the loop gain should be large. This is
accomplished by keeping the eigenvalues of the algorithm close to
the unit circle. This in turn is accomplished with small values
of the filter gain \alpha(k). Small values of \alpha(k), however,
increase the transient time of the algorithm, making it sluggish and
non-responsive to changes in data statistics. We now continue with
another formulation of the adaptive filter for which internal
gain adjustments allow for low values of misadjustment without
excessively long adaptation times.

VI.  OPTIMAL  LINEAR  COMBINER: 

We now address the following question: given an a priori
optimal estimate to the weight vector W(k), and the desired
output v(k), what is the optimum linear combination of this
information to compute the next estimate of the weight vector?

An equivalent question is to determine the form of the linear gains
L and G as shown in equation (23),

$$ W(k+1) = L\, W(k) + G\, v(k) \tag{23} $$

where v(k) satisfies Eq (6), repeated here as Eq (24).

$$ v(k) = W^T(k) X(k) + e(k) \tag{24} $$

First let us substitute Eq (24) into Eq (23) and then examine
the expected value of W(k+1).

\begin{align}
E\{W(k+1)\} &= E\{L\, W(k) + G\, v(k)\} \tag{25a} \\
            &= E\{L\, W(k) + G\, X^T(k) W(k) + G\, e(k)\} \tag{25b} \\
            &= [L + G\, X^T(k)]\, E\{W(k)\} + G\, E\{e(k)\} \tag{25c}
\end{align}

By assumption, E{e(k)} = 0, and E{W(k+1)} = E{W(k)} = W.

We then have

$$ L + G\, X^T(k) = I \tag{26a} $$

or

$$ L = I - G\, X^T(k) \tag{26b} $$

Substituting Eq (26) into Eq (23), we obtain Eq (27).

\begin{align}
W(k+1) &= L\, W(k) + G\, v(k) \tag{27a} \\
       &= [I - G\, X^T(k)]\, W(k) + G\, v(k) \tag{27b}
\end{align}

or

$$ W(k+1) = W(k) + G [v(k) - X^T(k) W(k)] \tag{27c} $$

Comparing Eq (27c) with Eq (22), we find the two forms
remarkably similar. It now remains to derive the gain term G.

Recalling the concept of misadjustment, we recognize that
any error e(k) computed during an iteration can come from two
sources. The first is simply the error due to misadjustment of
the weights. In that case, the weights should be adjusted. The
second source of error is the noise component in the data
vector X(k). This noise propagates to the filter output
even if the weights are set to the optimal values. For that
source of noise, we should not adjust the weight vector. We
now consider the cost function of Eq (28) which is structured
to reflect the two sources of uncertainty leading to the filter
error.

$$ J = [v(k) - X^T(k) W(k+1)]^2 + [W(k+1) - W(k)]^T B^{-1}(k) [W(k+1) - W(k)] \tag{28} $$

The matrix B(k) is yet to be defined, but will reflect our
desire to penalize (or emphasize) changes in the weight vector
relative to the prediction error. B(k) will be termed the
information matrix as it will reflect our a priori information
as to the likely source of instantaneous errors.

Now let us minimize the cost function J with respect to
the weight vector W(k+1). In Eq (29), we take the first
variation of J with respect to W(k+1).

\begin{align}
\delta J &= -2\, \delta W^T(k+1) X(k) [v(k) - X^T(k) W(k+1)]
            + 2\, \delta W^T(k+1) B^{-1}(k) [W(k+1) - W(k)] \tag{29a} \\
         &= 2\, \delta W^T(k+1) \{ [B^{-1}(k) + X(k) X^T(k)] W(k+1)
            - [B^{-1}(k) W(k) + X(k) v(k)] \} \tag{29b}
\end{align}

The first variation is zero if Eq (30) holds true.

$$ [B^{-1}(k) + X(k) X^T(k)]\, W(k+1) = [B^{-1}(k) W(k) + X(k) v(k)] \tag{30a} $$

or

$$ W(k+1) = [B^{-1}(k) + X(k) X^T(k)]^{-1} [B^{-1}(k) W(k) + X(k) v(k)] \tag{30b} $$

Now apply the inversion lemma shown in Eq (31)

$$ (A_1 + A_{12} A_2^{-1} A_{21})^{-1} = A_1^{-1} - A_1^{-1} A_{12} (A_2 + A_{21} A_1^{-1} A_{12})^{-1} A_{21} A_1^{-1} \tag{31} $$

with the substitutions indicated in Eq (32); we obtain Eq (33).

$$ A_1 = B^{-1}(k) \tag{32a} $$

$$ A_{12} = X(k) \tag{32b} $$

$$ A_{21} = X^T(k) \tag{32c} $$

$$ A_2^{-1} = 1 \tag{32d} $$

$$ W(k+1) = \{ B(k) - B(k) X(k) [1 + X^T(k) B(k) X(k)]^{-1} X^T(k) B(k) \} [B^{-1}(k) W(k) + X(k) v(k)] \tag{33a} $$

Expanding and gathering terms, we obtain Eq (33b).

\begin{align}
W(k+1) &= W(k) + B(k) X(k) v(k) - \frac{B(k) X(k) X^T(k) W(k)}{1 + X^T(k) B(k) X(k)}
          - \frac{B(k) X(k) X^T(k) B(k) X(k) v(k)}{1 + X^T(k) B(k) X(k)} \tag{33b} \\
       &= W(k) + \frac{B(k) X(k)}{1 + X^T(k) B(k) X(k)} [v(k) - X^T(k) W(k)] \tag{33c}
\end{align}

or

$$ W(k+1) = W(k) + \frac{B(k)}{1 + X^T(k) B(k) X(k)}\, X(k) [v(k) - X^T(k) W(k)] \tag{33d} $$

Comparing Eq (33d) with Eq (27), we have determined the gain term
of Eq (27) to be

$$ G = \frac{B(k)}{1 + X^T(k) B(k) X(k)}\, X(k) \tag{34} $$

and comparing Eq (33) with Eq (22), we recognize that the
convergence factor \alpha(k) which determines the size of the step
in the gradient direction has been replaced with the scaled
matrix indicated in Eq (35).

$$ \frac{1}{1 + X^T(k) B(k) X(k)}\, B(k) \tag{35} $$

Note that the weight update scheme presented in Eq (33)
constructs the sample gradient X(k)[v(k) - X^T(k)W(k)] as did the
LMS scheme but then subjects the gradient to the matrix
operator indicated in Eq (35). The particular form that the
operator takes is dependent upon the selection of the information
matrix B(k) (or B^{-1}(k)). The operator allows for the option of
rotating the gradient vector. This class of algorithm is called
a gradient deflection scheme. The extent of the deflection is
of course controlled by the information matrix B(k).
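
The update of Eq (33d) can be sketched as a single function. The name
combiner_update and the numerical values are assumptions made for
illustration; B(k) is taken to be symmetric, and a diagonal B(k) is
used so that B^{-1}(k) is available by inspection when checking the
result against Eq (30a).

```python
def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def combiner_update(w, b, x_vec, v):
    """Eq (33d): W(k+1) = W(k) + B X [v - X^T W] / (1 + X^T B X)."""
    bx = matvec(b, x_vec)                  # B(k) X(k)
    err = v - dot(x_vec, w)                # prediction error v - X^T W
    scale = err / (1.0 + dot(x_vec, bx))   # scalar 1 / (1 + X^T B X) times error
    return [wi + scale * bxi for wi, bxi in zip(w, bx)]


w = [0.2, -0.1]                    # assumed current weights W(k)
b = [[2.0, 0.0], [0.0, 0.5]]       # assumed information matrix B(k)
x_vec = [1.0, 3.0]                 # assumed data vector X(k)
v = 1.5                            # assumed desired output v(k)
w_new = combiner_update(w, b, x_vec, v)
print(w_new)
```

Because B(k) weights the two components differently, the correction
B(k)X(k) is not parallel to X(k), which is the gradient deflection
described above.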

VII.  SELECTION OF THE INFORMATION MATRIX B(k):


VII.A  Minimum Variance Filter.

The first approach we examine will be to determine
the B(k) which minimizes the variance of the estimate W(k).
Thus we need to minimize Eq (36) (where W^* is the optimal value
of the weight vector) subject to the update algorithm shown in
Eq (37).

$$ E\{[W(k+1) - W^*][W(k+1) - W^*]^T\} \triangleq S_{ww}(k+1) \tag{36} $$

$$ W(k+1) = W(k) + G(k) [v(k) - X^T(k) W(k)] \tag{37} $$

The definition of the error term corresponding to the optimal
weight vector W^* is shown in Eq (38).

$$ v(k) = X^T(k) W^* + e(k) \tag{38} $$

Then the error vector [W(k+1) - W^*] is shown in Eq (39).

\begin{align}
[W(k+1) - W^*] &= [W(k) - W^*] - G(k) X^T(k) [W(k) - W^*] + G(k) e(k) \tag{39a} \\
               &= [I - G(k) X^T(k)] [W(k) - W^*] + G(k) e(k) \tag{39b}
\end{align}

The variance can now be found by substitution of Eq (39b) into
Eq (36).

\begin{align}
S_{ww}(k+1) &= E\{[W(k+1) - W^*][W(k+1) - W^*]^T\} \notag \\
            &= E\{ \{[I - G(k) X^T(k)][W(k) - W^*] + G(k) e(k)\}
               \{[I - G(k) X^T(k)][W(k) - W^*] + G(k) e(k)\}^T \} \tag{40a} \\
            &= E\{[I - G(k) X^T(k)][W(k) - W^*][W(k) - W^*]^T [I - G(k) X^T(k)]^T\} \notag \\
            &\quad + E\{[I - G(k) X^T(k)][W(k) - W^*]\, G^T(k) e(k)\} \notag \\
            &\quad + E\{e(k) G(k) \{[I - G(k) X^T(k)][W(k) - W^*]\}^T\} \notag \\
            &\quad + E\{G(k) G^T(k) e^2(k)\} \tag{40b}
\end{align}

Moving the expected value operator through the right hand side
of Eq (40b) will lead to Eq (41), where E{e^2(k)} = R_ee(k). The
two cross terms vanish because e(k) is zero mean and is taken to be
uncorrelated with the weight error.

$$ S_{ww}(k+1) = [I - G(k) X^T(k)]\, S_{ww}(k)\, [I - G(k) X^T(k)]^T + G(k) G^T(k) R_{ee}(k) \tag{41} $$

Expanding Eq (41), we obtain the results shown in Eq (42).

\begin{align}
S_{ww}(k+1) &= S_{ww}(k) + G(k) X^T(k) S_{ww}(k) X(k) G^T(k) + G(k) G^T(k) R_{ee}(k) \notag \\
            &\quad - G(k) X^T(k) S_{ww}(k) - S_{ww}(k) X(k) G^T(k) \tag{42a} \\
            &= S_{ww}(k) + G(k) [X^T(k) S_{ww}(k) X(k) + R_{ee}(k)] G^T(k) \notag \\
            &\quad - G(k) X^T(k) S_{ww}(k) - S_{ww}(k) X(k) G^T(k) \tag{42b}
\end{align}

For the purpose of further manipulation of this expression, we
define the scalar shown in Eq (43).

$$ d(k) = [X^T(k) S_{ww}(k) X(k) + R_{ee}(k)] \tag{43} $$

Now rewrite Eq (42) in the form indicated as Eq (44).

\begin{align}
S_{ww}(k+1) &= G(k) d(k) G^T(k) + S_{ww}(k) \notag \\
            &\quad - G(k) d(k) d^{-1}(k) X^T(k) S_{ww}(k) - S_{ww}(k) X(k) d^{-1}(k) d(k) G^T(k) \notag \\
            &\quad + S_{ww}(k) X(k) d^{-1}(k) X^T(k) S_{ww}(k) - S_{ww}(k) X(k) d^{-1}(k) X^T(k) S_{ww}(k) \tag{44a} \\
            &= [G(k) - S_{ww}(k) X(k) d^{-1}(k)]\, d(k)\, [G^T(k) - d^{-1}(k) X^T(k) S_{ww}(k)] \notag \\
            &\quad + S_{ww}(k) [I - X(k) d^{-1}(k) X^T(k) S_{ww}(k)] \tag{44b}
\end{align}

Now define a new cost function J(G) by Eq (45).

\begin{align}
J(G) &= TR[S_{ww}(k+1)] \tag{45a} \\
     &= TR\{ [G - S_{ww} X d^{-1}]\, d\, [G^T - d^{-1} X^T S_{ww}] + S_{ww} [I - X d^{-1} X^T S_{ww}] \} \tag{45b}
\end{align}

We have temporarily suppressed the indices of Eq (45) for
economy of notation. TR( ) indicates the trace of the associated
matrix. We now form the first variation of the trace with
respect to the gain matrix G.

\begin{align}
\delta J(G) &= TR\{ \delta[G - S_{ww} X d^{-1}]\, d\, [G^T - d^{-1} X^T S_{ww}]
               + [G - S_{ww} X d^{-1}]\, d\, \delta[G^T - d^{-1} X^T S_{ww}] \} \tag{46a} \\
            &= TR\{ \delta G\, d\, [G^T - d^{-1} X^T S_{ww}] + [G - S_{ww} X d^{-1}]\, d\, \delta G^T \} \tag{46b} \\
            &= 2\, TR\{ \delta G\, d\, [G^T - d^{-1} X^T S_{ww}] \} \tag{46c}
\end{align}

The first variation of Eq (46c) will be zero if Eq (47) holds.

$$ d(k) [G^T(k) - d^{-1}(k) X^T(k) S_{ww}(k)] = 0 \tag{47a} $$

$$ d(k) G^T(k) - X^T(k) S_{ww}(k) = 0 \tag{47b} $$

$$ G(k) = d^{-1}(k) S_{ww}(k) X(k) \tag{47c} $$

Now substituting Eq (43) back into Eq (47c), we have the gain
term G(k) which corresponds to the minimum variance filter.

\begin{align}
G(k) &= \frac{S_{ww}(k)}{R_{ee}(k) + X^T(k) S_{ww}(k) X(k)}\, X(k) \tag{48a} \\
     &= \frac{R_{ee}^{-1}(k) S_{ww}(k)}{1 + X^T(k) R_{ee}^{-1}(k) S_{ww}(k) X(k)}\, X(k) \tag{48b}
\end{align}

Comparing Eq (48b) with Eq (34), we recognize that the information
matrix B(k) which realizes the minimum variance weights while
minimizing the cost function of Eq (28) is simply the weight
covariance matrix scaled by the reciprocal of the error power
of the filter. This is indicated in Eq (49).

$$ B(k) = R_{ee}^{-1}(k) S_{ww}(k) \tag{49} $$

We must now construct the weight covariance matrix S_ww(k+1).
This is accomplished by substituting the minimum variance
gain G(k) back into Eq (41). This is done in Eq (50).

\begin{align}
S_{ww}(k+1) &= S_{ww}(k) + G(k) d(k) G^T(k) \notag \\
            &\quad - G(k) X^T(k) S_{ww}(k) - S_{ww}(k) X(k) G^T(k) \tag{50a}
\end{align}

\begin{align}
S_{ww}(k+1) &= S_{ww}(k) + d^{-1}(k) S_{ww}(k) X(k)\, d(k)\, d^{-1}(k) X^T(k) S_{ww}(k) \notag \\
            &\quad - d^{-1}(k) S_{ww}(k) X(k) X^T(k) S_{ww}(k)
               - d^{-1}(k) S_{ww}(k) X(k) X^T(k) S_{ww}(k) \tag{50b} \\
            &= S_{ww}(k) - \frac{S_{ww}(k) X(k) X^T(k) S_{ww}(k)}{R_{ee}(k) + X^T(k) S_{ww}(k) X(k)} \tag{50c}
\end{align}

If we rewrite Eq (50c) as Eq (50d) and again apply the inversion
lemma with the substitutions indicated in Eq (50e), we will
have the expression shown in Eq (51).

$$ S_{ww}(k+1) = S_{ww}(k) - S_{ww}(k) X(k) [R_{ee}(k) + X^T(k) S_{ww}(k) X(k)]^{-1} X^T(k) S_{ww}(k) \tag{50d} $$

$$ A_1^{-1} = S_{ww}(k), \qquad A_{12} = X(k), \qquad A_{21} = X^T(k), \qquad A_2 = R_{ee}(k) \tag{50e} $$

$$ [A_1 + A_{12} A_2^{-1} A_{21}]^{-1} = A_1^{-1} - A_1^{-1} A_{12} [A_2 + A_{21} A_1^{-1} A_{12}]^{-1} A_{21} A_1^{-1} $$

$$ S_{ww}(k+1) = [S_{ww}^{-1}(k) + R_{ee}^{-1}(k) X(k) X^T(k)]^{-1} \tag{51a} $$

or

$$ S_{ww}^{-1}(k+1) = S_{ww}^{-1}(k) + R_{ee}^{-1}(k) X(k) X^T(k) \tag{51b} $$

Equation (51) is not as convenient for computational work as is
Eq (50). Equation (51) does however offer insight into the
behavior of the covariance matrix S_ww(k+1) as the algorithm
iterates. A matrix signal flow graph corresponding to Eq (51)
is shown in Figure 6.

FIGURE 6.  SIGNAL FLOW GRAPH OF INVERSE COVARIANCE MATRIX
           UPDATE ALGORITHM [EQ (51)]


We can surmise the behavior of Eq (51) by examining the structure
of Figure 6. We see that the inverse covariance matrix S_ww^{-1}(k+1)
is simply the output of a matrix integrator. As the iteration
progresses, the diagonal terms of this inverse matrix increase
without bound, hence the covariance matrix S_ww(k+1) approaches
zero. We can trace this effect back to the cost function of Eq (28)
from which we constructed this algorithm. We see that as the inverse
covariance matrix becomes arbitrarily large with large time
index, the cost function penalizes changes in the weight vector.
In a sense, because the covariance matrix S_ww(k) is approaching
zero, the algorithm is convinced that the weights are correct
and should not be changed. The filter reflects this position
by driving the feedforward gain [Eq (48)] to zero, effectively
preventing changes in the weight vector.
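
The minimum variance recursion can be sketched by combining the gain
of Eq (48a) with the covariance update of Eq (50c). The function name
min_variance_step, the initial covariance of 100 times the identity,
the "unknown" weights, and the noise level are assumptions made for
this example; S_ww(k) is assumed symmetric.

```python
import random


def min_variance_step(w, s, x_vec, v, r_ee):
    """One step of Eq (37) with the gain of Eq (48a), then Eq (50c)."""
    n = len(w)
    sx = [sum(s[i][j] * x_vec[j] for j in range(n)) for i in range(n)]  # S X
    d = r_ee + sum(x_vec[i] * sx[i] for i in range(n))                  # Eq (43)
    err = v - sum(w[i] * x_vec[i] for i in range(n))
    w_new = [w[i] + sx[i] * err / d for i in range(n)]                  # Eq (37), (48a)
    s_new = [[s[i][j] - sx[i] * sx[j] / d for j in range(n)]            # Eq (50c)
             for i in range(n)]
    return w_new, s_new


random.seed(1)
w_true = [0.8, -0.4, 0.2]      # hypothetical optimal weights W*
r_ee = 0.01                    # assumed error power (noise std 0.1)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
w = [0.0, 0.0, 0.0]
s = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
trace_start = sum(s[i][i] for i in range(3))
for k in range(len(x)):
    x_vec = [x[k - j] if k - j >= 0 else 0.0 for j in range(3)]
    v = sum(a * b for a, b in zip(w_true, x_vec)) + random.gauss(0.0, 0.1)
    w, s = min_variance_step(w, s, x_vec, v, r_ee)
trace_end = sum(s[i][i] for i in range(3))
print(w, trace_end)
```

The trace of the covariance shrinks steadily as data accumulate, which
is the "turning off" behavior described above: late samples produce
almost no change in the weights.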

VII.B  Fading Memory Filter.


In the previous section we selected the information matrix
B(k) to be equal to the scaled version of the weight covariance
matrix. We saw that the covariance matrix was constructed
iteratively by a matrix integrator as shown in Figure 6 and
described below in Eq (52).

$$ B(k+1) \propto S_{ww}(k+1) = \left[ \sum_{i=0}^{k} \frac{X(i) X^T(i)}{R_{ee}(i)} \right]^{-1} \tag{52a} $$

We alluded to a potential problem with this selection of the
information matrix, namely that the filter turns itself off
as the information matrix goes towards zero. We now consider a
variation of the information matrix which prevents the filter
from shutting down. We take a hint from Eq (52) and Figure 6
and define an exponentially weighted summation in Eq (53).

\begin{align}
B(k+1) \propto P_{ww}(k+1)
  &= \left[ \sum_{i=0}^{k} \frac{X(i) X^T(i)}{R_{ee}(i)}\, q^{(k-i)} \right]^{-1} \tag{53a} \\
  &= \left[ \sum_{i=0}^{k-1} \frac{X(i) X^T(i)}{R_{ee}(i)}\, q^{(k-i)} + \frac{X(k) X^T(k)}{R_{ee}(k)} \right]^{-1} \tag{53b} \\
  &= \left[ q \sum_{i=0}^{k-1} \frac{X(i) X^T(i)}{R_{ee}(i)}\, q^{(k-1-i)} + \frac{X(k) X^T(k)}{R_{ee}(k)} \right]^{-1} \tag{53c}
\end{align}

or

$$ P_{ww}(k+1) = \left[ q\, P_{ww}^{-1}(k) + \frac{X(k) X^T(k)}{R_{ee}(k)} \right]^{-1} \tag{53d} $$

We recognize that Eq (53d) describes an integrator with other
than unity feedback. If the scalar "q" is less than unity, the
integrator is called a leaky integrator. A signal flow graph of
Eq (53d) is shown in Figure 7.


FIGURE 7.  SIGNAL FLOW GRAPH OF LEAKY INTEGRATOR
           REPRESENTATION OF INFORMATION MATRIX


Note that the leaky integrator's output does not increase
without bound as does the unity gain integrator but is limited
to a steady state gain of (1-q)^{-1}. Thus the gain of the algorithm
does not approach zero, but rather approaches small values
proportional to (1-q). We refer to the parameter "q" as the fade
factor of the algorithm.

If we apply the inversion lemma to Eq (53) with the
following substitutions we obtain the equivalent form of the
information matrix update as indicated in Eq (54).

$$ A_1^{-1} = q^{-1} P_{ww}(k) \tag{54a} $$

$$ A_{12} = X(k) \tag{54b} $$

$$ A_{21} = X^T(k) \tag{54c} $$

$$ A_2 = R_{ee}(k) \tag{54d} $$

$$ [A_1 + A_{12} A_2^{-1} A_{21}]^{-1} = A_1^{-1} - A_1^{-1} A_{12} [A_2 + A_{21} A_1^{-1} A_{12}]^{-1} A_{21} A_1^{-1} \tag{54e} $$

$$ P_{ww}(k+1) = q^{-1} P_{ww}(k) - \frac{q^{-1} P_{ww}(k) X(k) X^T(k) P_{ww}(k)\, q^{-1}}{R_{ee}(k) + X^T(k)\, q^{-1} P_{ww}(k) X(k)} \tag{54f} $$

or

$$ P_{ww}(k+1) = \frac{1}{q} \left[ P_{ww}(k) - \frac{P_{ww}(k) X(k) X^T(k) P_{ww}(k)}{q R_{ee}(k) + X^T(k) P_{ww}(k) X(k)} \right] \tag{54g} $$

Comparing Eq (54g) to Eq (50) and Figure 7 to Figure 6, we
of course recognize that as the parameter "q" goes to unity,
the leaky information matrix becomes the minimum variance
information matrix.
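
A small numerical sketch of Eq (54g) follows. The function name
fading_memory_step and the scalar test values are assumptions made for
illustration. In the one-dimensional case with a constant input x,
Eq (53d) gives the steady state P = R_ee (1-q) / x^2, which the
iteration below approaches; setting q = 1 recovers the update of
Eq (50c), whose covariance keeps shrinking.

```python
def fading_memory_step(p, x_vec, r_ee, q):
    """Eq (54g): P(k+1) = (1/q) [P - P X X^T P / (q R_ee + X^T P X)].

    P(k) is assumed symmetric, so X^T P is the transpose of P X.
    """
    n = len(x_vec)
    px = [sum(p[i][j] * x_vec[j] for j in range(n)) for i in range(n)]
    denom = q * r_ee + sum(x_vec[i] * px[i] for i in range(n))
    return [[(p[i][j] - px[i] * px[j] / denom) / q for j in range(n)]
            for i in range(n)]


# Scalar example: constant input x = 2, error power 0.5 (assumed values)
p_leaky = [[10.0]]
p_unity = [[10.0]]
for _ in range(200):
    p_leaky = fading_memory_step(p_leaky, [2.0], r_ee=0.5, q=0.9)
    p_unity = fading_memory_step(p_unity, [2.0], r_ee=0.5, q=1.0)

print(p_leaky[0][0])   # settles near R_ee (1 - q) / x^2
print(p_unity[0][0])   # continues to decay towards zero
```

The leaky version therefore keeps a nonzero gain, proportional to
(1-q), while the unity feedback version continues towards zero as more
data arrive.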


VII.C  Steepest Descent Filter.

Now suppose we select the information matrix B(k) to be
K times the identity matrix [K I] where K is a large scalar.
Thus B^{-1}(k) = (1/K) I. This matrix has the effect in the penalty
function Eq (28) of not penalizing changes in the weight vector.
This is equivalent to allowing the weight vector to make large
changes at each iteration. Substituting this B(k) into Eq (33c)
we obtain Eq (55).

\begin{align}
W(k+1) &= W(k) + \frac{K}{1 + X^T(k)\, K I\, X(k)}\, X(k) [v(k) - X^T(k) W(k)] \tag{55a} \\
       &= W(k) + \frac{1}{K^{-1} + X^T(k) X(k)}\, X(k) [v(k) - X^T(k) W(k)] \tag{55b}
\end{align}

We note that as K becomes large so that 1/K becomes small with
respect to the inner product X^T(k)X(k), Eq (55b) approaches
Eq (56).

$$ W(k+1) = W(k) + \frac{1}{X^T(k) X(k)}\, X(k) [v(k) - X^T(k) W(k)] \tag{56} $$

We recognize Eq (56) is equivalent to the steepest descent
LMS algorithm of Eq (22). Note that this filter results from
the penalty function interpreting the mean square error as
arising from errors in the weight vector. This is equivalent
to interpreting the scalar K as R_ee^{-1}(k), the reciprocal of the
power in the prediction error of the optimal weight vector. Then
the information matrix B(k) = K I becomes R_ee^{-1}(k) I. If K is
large, the interpretation is that the optimal prediction error is
small, hence the filter errors are indeed due to an incorrect
weight vector.

VI  I .0.  LMS  Fi iter. 

Now  suppose  we  select  the  information  matrix  B(k)  to  be 
U  times  the  identity  matrix  ly  I]  where  p  Is  a  small  scalar. 

Thus  B  '(k)  ■  I .  This  has  the  effect  In  the  penalty  function 
Eq  (27)  of  severely  penalizing  changes  in  the  weight  vector  or 
equivalently  of  preventing  the  weight  vector  from  having  large 
changes  at  each  Iteration.  Substituting  this  B(k)  into  Eq  (33c) 
we  obtain  Eq  (57) . 

W(k+1)  -  U(k)  ♦  - — -  X(k)[v(k)-XT(k)W(k)]  (57a) 

I  +  X1  (k)yl  7(k) 

-  W(k)  +  - ^ -  X(k)  [v(k)-XT(k)W(k)]  (57b) 

1  +  yx‘ (k)X(k) 

We  note  that  as  y  becomes  sufficiently  small  to  force  the  product 
y7T(k)J(k)  to  be  small  compared  to  unity,  Eq  (57)  approaches 
Eq  (58). 

W(k+1)  -  W(k)  +  yX(k) Iv(k)-XT(k)W(k)]  (58) 

We recognize that Eq (58) is equivalent to Widrow's LMS algorithm. Note that this filter results from the penalty function interpreting the mean square error as arising from prediction noise. This is equivalent to interpreting the scalar $\mu$ as $R_{ee}^{-1}(k)$, the inverse of the power in the prediction error for the optimal vector weights. Then the information matrix $B(k) = \mu I$ becomes $R_{ee}^{-1}(k)\,I$. If the scalar $\mu$ is small, the interpretation is that the optimal weight prediction error is large. That is, the data are very noisy! Thus changes in the weight vector will have little effect in reducing the error.
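
The LMS recursion of Eq (58) can be sketched in a few lines of Python. The example below is only an illustration under our own assumptions: a hypothetical two-coefficient "true" filter `w_true`, a noise-free desired signal, a short deterministic input sequence repeated many times, and a step size `mu` small enough that $\mu X^T(k)X(k)$ stays well below unity for every input vector in the sequence.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def lms_update(w, x, v, mu):
    """One step of Eq (58): W(k+1) = W(k) + mu X(k) [v(k) - X^T(k) W(k)]."""
    err = v - dot(x, w)
    return [wi + mu * xi * err for wi, xi in zip(w, x)]


# Hypothetical system and data, chosen only for illustration.
w_true = [0.5, -0.25]
mu = 0.1
inputs = [1.0, -1.0, 0.5, 2.0, -0.5, 1.5, -2.0, 1.0] * 50

w = [0.0, 0.0]
for k in range(1, len(inputs)):
    x = [inputs[k], inputs[k - 1]]     # present and previous input samples
    v = dot(x, w_true)                 # noise-free desired signal
    w = lms_update(w, x, v, mu)

misalignment = [wt - wi for wt, wi in zip(w_true, w)]
```

With noise-free data the weights in this sketch settle onto `w_true`. With noisy data a small `mu` would be expected to average the noise at the cost of a slower transient, which matches the interpretation of $\mu$ given above.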


To summarize this section, we have found the optimal linear combiner to be of the form shown in Figure 8 and in Eq (59).


FIGURE  8.  SIGNAL  FLOW  GRAPH  OF  OPTIMAL  LINEAR  COMBINER 

$$W(k+1) = W(k) + G(k)\,X(k)\,[v(k) - X^T(k)W(k)] \tag{59a}$$

$$\text{Where}\quad G(k) = \frac{B(k)}{1 + X^T(k)B(k)X(k)} \tag{59b}$$

The B(k)'s of interest to us are listed below:

1. $B(k) = R_{ee}^{-1}(k)\,S_W(k)$  [Minimum Variance]  (59c)

   Where $S_W(k+1) = [\,S_W^{-1}(k) + R_{ee}^{-1}(k)\,X(k)X^T(k)\,]^{-1}$  (59d)

2. $B(k) = R_{ee}^{-1}(k)\,P_W(k)$  [Fading Memory]  (59e)

   Where $P_W(k+1) = [\,q\,P_W^{-1}(k) + R_{ee}^{-1}(k)\,X(k)X^T(k)\,]^{-1}$  (59f)

3. $B(k) = K I,\ K \gg 1$  [$\approx$ Steepest Descent]  (59g)

4. $B(k) = \mu I,\ \mu \ll 1$  [$\approx$ LMS]  (59h)
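
The general update of Eq (59a) and Eq (59b) and its two scalar limiting cases can be compared numerically. The Python sketch below is only illustrative: the function names, the test vectors, and the particular values K = 1e9 and mu = 1e-6 are our own assumptions. It evaluates the combiner with B(k) = K I and with B(k) = mu I and compares the results with the limiting forms of Eq (56) and Eq (58).

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, x):
    """Matrix (list of rows) times vector."""
    return [dot(row, x) for row in m]


def scaled_identity(c, n):
    """The n-by-n matrix c*I."""
    return [[c if i == j else 0.0 for j in range(n)] for i in range(n)]


def combiner_update(w, x, v, b):
    """Eq (59a) with G(k) = B(k) / (1 + X^T(k) B(k) X(k)) from Eq (59b)."""
    bx = matvec(b, x)
    scale = 1.0 + dot(x, bx)
    err = v - dot(x, w)
    return [wi + bxi * err / scale for wi, bxi in zip(w, bx)]


# Hypothetical data, chosen only for illustration.
w = [0.2, -0.1, 0.4]
x = [1.0, 2.0, -1.0]
v = 3.0
err = v - dot(x, w)

# Case 3: B = K I with K >> 1, compared with Eq (56).
w_large_k = combiner_update(w, x, v, scaled_identity(1e9, 3))
w_eq56 = [wi + xi * err / dot(x, x) for wi, xi in zip(w, x)]

# Case 4: B = mu I with mu << 1, compared with Eq (58).
mu = 1e-6
w_small_mu = combiner_update(w, x, v, scaled_identity(mu, 3))
w_eq58 = [wi + mu * xi * err for wi, xi in zip(w, x)]

gap_steepest = max(abs(a - b) for a, b in zip(w_large_k, w_eq56))
gap_lms = max(abs(a - b) for a, b in zip(w_small_mu, w_eq58))
```

Both gaps are small for these values and shrink further as K grows or mu decreases, which is the sense in which cases 3 and 4 of the list approximate the steepest descent and LMS algorithms.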

VIII. STEADY STATE BEHAVIOR OF OPTIMAL LINEAR COMBINER:

We will now examine the steady state behavior of the linear combiner. We start with a form of the algorithm for the weight vector update which emphasizes the weight error vector.

The weight error vector is defined as the difference between the optimal weight vector $W^*$ and the present weight vector W(k+1).


We can rewrite Eq (59a) as Eq (60).

$$W(k+1) = W(k) + \Delta W(k+1) \tag{60a}$$

$$\text{Where}\quad \Delta W(k+1) = G(k)\,X(k)\,[v(k) - X^T(k)W(k)] \tag{60b}$$

We define the weight error vector, or the misalignment vector, h(k+1) in Eq (61).

$$h(k+1) = W^* - W(k+1) \tag{61}$$

We can manipulate Eq (60) and substitute Eq (61) to obtain the results shown in Eq (62).

$$\Delta W(k+1) = W(k+1) - W(k) \tag{62a}$$

$$= [W^* - h(k+1)] - [W^* - h(k)] \tag{62b}$$

$$= -[h(k+1) - h(k)] \tag{62c}$$

$$= -\Delta h(k+1) \tag{62d}$$


Thus the change in the weight vector is the negative of the change in the misalignment vector. This is shown in Eq (63).

$$\Delta h(k+1) = -\Delta W(k+1) \tag{63a}$$

$$= -G(k)X(k)\,\{v(k) - X^T(k)[W^* - h(k)]\} \tag{63b}$$

$$= G(k)X(k)\,\{X^T(k)W^* - v(k) - X^T(k)h(k)\} \tag{63c}$$

We can substitute Eq (38), repeated here as Eq (64a), into Eq (63c) and obtain Eq (65), which is the difference equation for the misalignment vector.

$$X^T(k)W^* - e(k) = v(k) \tag{64a}$$

$$\text{Hence}\quad \Delta h(k+1) = G(k)X(k)\,[e(k) - X^T(k)h(k)] \tag{64b}$$

$$\text{Thus}\quad h(k+1) = h(k) + G(k)X(k)\,[e(k) - X^T(k)h(k)] \tag{65}$$

A signal flow graph representing Eq (65) is presented in Figure 9.


FIGURE 9. SIGNAL FLOW GRAPH FOR MISALIGNMENT VECTOR
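
The equivalence between the weight recursion of Eq (59a) and the misalignment recursion of Eq (65) can be checked numerically. The Python sketch below is a minimal illustration under our own assumptions: a hypothetical optimal weight vector `w_opt`, a scalar information matrix B(k) = 2 I, three made-up pairs of input vector and optimal prediction error, and the sign convention of Eq (64a), v(k) = X^T(k) W* - e(k).

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gain_times_input(x, b_scalar):
    """G(k) X(k) for B(k) = b_scalar * I, using Eq (59b)."""
    scale = 1.0 + b_scalar * dot(x, x)
    return [b_scalar * xi / scale for xi in x]


# Hypothetical values, chosen only for illustration.
w_opt = [1.0, -0.5]                      # optimal weight vector W*
w = [0.0, 0.0]                           # present weight vector W(k)
h = [wo - wi for wo, wi in zip(w_opt, w)]  # misalignment, Eq (61)

# Each entry is (input vector X(k), optimal prediction error e(k)).
data = [([1.0, 0.5], 0.1), ([-0.5, 2.0], -0.2), ([1.5, -1.0], 0.05)]

for x, e in data:
    v = dot(x, w_opt) - e                # Eq (64a) solved for v(k)
    gx = gain_times_input(x, 2.0)
    w_err = v - dot(x, w)                # bracket of Eq (59a)
    h_err = e - dot(x, h)                # bracket of Eq (65)
    w = [wi + gi * w_err for wi, gi in zip(w, gx)]
    h = [hi + gi * h_err for hi, gi in zip(h, gx)]

# Largest difference between W* - W(k) and the separately propagated h(k).
mismatch = max(abs((wo - wi) - hi) for wo, wi, hi in zip(w_opt, w, h))
```

The two recursions stay in step, so the misalignment can be studied through Eq (65) alone without reference to the weight vector itself.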


Let us examine the steady state behavior of the misalignment vector. In steady state, the expected value of the change in misalignment is zero. See Eq (66).

$$E\{\Delta h(k+1)\} = 0 \tag{66a}$$

$$E\{G(k)X(k)\,[e(k) - X^T(k)h(k)]\} = 0 \tag{66b}$$

$$E\{G(k)X(k)e(k)\} - E\{G(k)X(k)X^T(k)h(k)\} = 0 \tag{66c}$$

From which we conclude the following:

$$E\{h(k)\} = E\{G(k)X(k)X^T(k)\}^{-1}\,E\{G(k)X(k)\}\,E\{e(k)\} \tag{67a}$$

$$\text{Or}\quad E\{h(k)\} = 0 \tag{67b}$$

Eq (67b) follows from Eq (67a) when the optimal prediction error e(k) has zero mean. Thus we find that the expected misalignment is zero. This is true for any gain matrix, provided of course that the filter is stable. We anticipated this result earlier when we demonstrated that W(k) is an unbiased estimate of $W^*$, the optimal weight vector.

IX. CONVERGENCE PROPERTIES OF ALGORITHM.

Now let us consider the transformation of variables which diagonalizes the matrix $G(k)X(k)X^T(k)$. Let the new coordinate system be defined by Eq (68).

$$h(k) = Q\,h'(k) \tag{68a}$$

$$\text{Then}\quad Q\,h'(k+1) = Q\,h'(k) + G(k)X(k)e(k) - G(k)X(k)X^T(k)\,Q\,h'(k) \tag{68b}$$

$$\text{Or}\quad h'(k+1) = h'(k) + Q^{-1}G(k)X(k)e(k) - Q^{-1}G(k)X(k)X^T(k)\,Q\,h'(k) \tag{68c}$$

Now select the matrix Q to be the transformation which diagonalizes $G(k)X(k)X^T(k)$, and let $\Lambda(k) = Q^{-1}G(k)X(k)X^T(k)Q$ denote the resulting diagonal matrix. Then the diagonalized or modal form of Eq (68) is shown in Eq (69). The signal flow graph corresponding to Eq (69) is presented in Figure 10.

$$h'(k+1) = h'(k) + Q^{-1}G(k)X(k)e(k) - \Lambda(k)\,h'(k) \tag{69a}$$

$$= [I - \Lambda(k)]\,h'(k) + Q^{-1}G(k)X(k)e(k) \tag{69b}$$


FIGURE 10. SIGNAL FLOW GRAPH FOR MISALIGNMENT VECTOR IN DIAGONALIZED COORDINATE SYSTEM.

We note that in the modal form of the difference equation, each coordinate (or component) of the misalignment vector operates independently. A signal flow graph of the m-th component in the modal form of the difference equation is shown in Figure 11.


FIGURE 11. SIGNAL FLOW GRAPH FOR M-TH MODAL COORDINATE OF MISADJUSTMENT VECTOR


The difference equation corresponding to the separate modal coordinates is shown in Eq (70), where $g_m(k)$ is the m-th component of $Q^{-1}G(k)X(k)$.

$$h'_m(k+1) = [1 - \lambda_m(k)]\,h'_m(k) + g_m(k)\,e(k) \tag{70}$$

Stability of the modal coordinates is assured if the homogeneous system Eq (71) is non increasing.

$$h'_m(k+1) = [1 - \lambda_m(k)]\,h'_m(k) \tag{71}$$

The homogeneous system is non increasing if the magnitude of the contraction constant $K_m(k)$, defined in Eq (72), is bounded by unity.

$$|K_m(k)| = |1 - \lambda_m(k)| \le 1 \tag{72}$$

It now remains to examine the eigenvalues of $G(k)X(k)X^T(k)$, which are identical with the eigenvalues of $Q^{-1}G(k)X(k)X^T(k)Q$. This matrix is shown in expanded form in Eq (73).

$$G(k)X(k)X^T(k) = \frac{B(k)X(k)X^T(k)}{1 + X^T(k)B(k)X(k)} \tag{73}$$

As shown, Eq (73) is a dyadic operator. It is a singular matrix with eigenvalues identified in Eq (74).

$$\lambda_1 = \frac{X^T(k)B(k)X(k)}{1 + X^T(k)B(k)X(k)} \tag{74a}$$

$$= \frac{TR[B(k)X(k)X^T(k)]}{1 + TR[B(k)X(k)X^T(k)]} \tag{74b}$$

$$\lambda_2 = \lambda_3 = \lambda_4 = \cdots = \lambda_N = 0 \tag{74c}$$

Then for each iteration, the contraction constant $K_m(k)$ of Eq (72) is simply that shown in Eq (75).

$$K_m(k) = [1 - \lambda_m(k)] \tag{75a}$$

$$= \frac{1}{1 + TR[B(k)X(k)X^T(k)]}\ ;\quad m = 1 \tag{75b}$$

$$= 1\ ;\quad m \ne 1 \tag{75c}$$

Thus as a dyadic operator, the contraction constant is non increasing in all coordinates if Eq (76) holds true.

$$TR[B(k)X(k)X^T(k)] \ge 0 \tag{76}$$

Since $B(k)X(k)X^T(k)$ has only one non zero eigenvalue, and since $TR[B(k)X(k)X^T(k)]$ is equal to the sum of the eigenvalues, Eq (76) is equivalent to the requirement that the single non zero eigenvalue be non negative, or equivalently to requiring that $B(k)X(k)X^T(k)$ be positive semidefinite. This in turn is equivalent to requiring that $\langle X(k), B(k)X(k)\rangle \ge 0$. From this we conclude that B(k) must be positive semidefinite. On the other hand, if we require at each iteration of the algorithm that the contraction constant of the active coordinate be strictly less than unity, then B(k) must be positive definite.
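
The eigenvalue structure claimed in Eq (74) and Eq (75) can be checked on a small example. The following Python sketch is illustrative only: the diagonal positive definite B, the input vector x, and the test vector u orthogonal to x are hypothetical values of our own choosing. It builds the dyadic matrix of Eq (73), confirms that B x is an eigenvector with eigenvalue $\lambda_1$, and confirms that a vector orthogonal to x is mapped to zero.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def matvec(m, vec):
    """Matrix (list of rows) times vector."""
    return [dot(row, vec) for row in m]


# Hypothetical values, chosen only for illustration.
x = [1.0, 2.0, -1.0]
b_diag = [0.5, 1.0, 2.0]                     # diagonal of a positive definite B

bx = [b * xi for b, xi in zip(b_diag, x)]    # the vector B X
quad = dot(x, bx)                            # X^T B X, equal to TR[B X X^T]
scale = 1.0 + quad

# Dyadic operator of Eq (73): (B X)(X^T) / (1 + X^T B X).
m = [[bxi * xj / scale for xj in x] for bxi in bx]

lam1 = quad / scale                          # Eq (74a)
trace = sum(m[i][i] for i in range(len(x)))  # sum of the eigenvalues

# B X should be an eigenvector with eigenvalue lam1.
residual_active = max(abs(a - lam1 * b) for a, b in zip(matvec(m, bx), bx))

# Any vector orthogonal to X should be annihilated (zero eigenvalue).
u = [2.0, -1.0, 0.0]                         # dot(x, u) == 0
residual_null = max(abs(a) for a in matvec(m, u))

k_active = 1.0 - lam1                        # Eq (75b)
```

In this example only the direction B x is contracted at the present iteration, with contraction constant 1/(1 + X^T B X); every direction orthogonal to x is left unchanged, which is why the average over many iterations is examined next.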

Now rather than pursue the time varying active direction of the algorithm, we will examine the average of the active directions. We do so by applying the expectation operator to Eq (73), thus converting from a dyadic operator to a full rank operator. This is indicated in Eq (77a).

$$E\left\{\frac{B(k)X(k)X^T(k)}{1 + X^T(k)B(k)X(k)}\right\} \tag{77a}$$

This can be expanded in an approximate form as indicated in Eq (77b).

$$\frac{E\{B(k)X(k)X^T(k)\}}{E\{1 + X^T(k)B(k)X(k)\}} \tag{77b}$$


Assuming that the matrix in Eq (77) has full rank, the diagonalized form of $E\{B(k)X(k)X^T(k)\}$ has eigenvalues $C_1$ through $C_N$. The scaled eigenvalues of Eq (77b) are shown in Eq (78).

$$\lambda_m = \frac{C_m}{TR[X^T(k)B(k)X(k)]} \tag{78a}$$

$$= \frac{C_m}{\sum_{n=1}^{N} C_n} \tag{78b}$$

The homogeneous difference equation for the average misadjustment modal system is shown in Eq (79).

$$h'_m(k+1) = [1 - \lambda_m]\,h'_m(k) \tag{79a}$$

$$= \left[1 - \frac{C_m}{\sum_{n=1}^{N} C_n}\right] h'_m(k) \tag{79b}$$


The contraction constant $K_m(k)$ for this system is non increasing if Eq (80) holds true.

$$|K_m(k)| = \left|1 - \frac{C_m}{\sum_{n=1}^{N} C_n}\right| \le 1 \tag{80a}$$

$$= \left|\frac{\sum_{n \ne m} C_n}{\sum_{n=1}^{N} C_n}\right| \le 1 \tag{80b}$$

$$= \left|\frac{TR[X^T(k)B(k)X(k)] - C_m}{TR[X^T(k)B(k)X(k)]}\right| \tag{80c}$$

A sufficient condition for Eq (80) to hold true is that the eigenvalues $C_m$ satisfy Eq (81).

$$C_m \ge 0 \tag{81}$$

Equation (81) is equivalent to requiring that the matrix $E\{B(k)X(k)X^T(k)\}$ be positive semi definite. If the matrix $E\{B(k)X(k)X^T(k)\}$ is of full rank (we assume it is), then Eq (81) must be satisfied with strict inequality, which means the matrix $E\{B(k)X(k)X^T(k)\}$ is of course positive definite.


X.  RATES  OF  CONVERGENCE  OF  ALGORITHM. 


Let us now examine the manner in which the $C_m$'s, the eigenvalues of $E\{B(k)X(k)X^T(k)\}$, affect the transient behavior of the adaptive filters. Let us first consider the case in which all of the eigenvalues $C_m$ are approximately the same value or size. Then the contraction constants $K_m(k)$ are approximately of the form shown in Eq (82).

$$K_m(k) = \frac{\sum_{n \ne m} C_n}{\sum_{n=1}^{N} C_n} \tag{82a}$$

$$\approx \frac{N-1}{N} = 1 - \frac{1}{N} \tag{82b}$$

We note that N is the dimension of the non recursive filter and that the average transient evolves over the time index "k" according to Eq (83).

$$\Delta h'_m(k) = \Delta h'_m(0)\,[K_m]^k \tag{83}$$


The relative change in a given weight misalignment coordinate that can occur in a single iteration is simply:

$$\frac{\Delta h'_m(k+1) - \Delta h'_m(k)}{\Delta h'_m(k)} \tag{84a}$$

$$= \frac{[K_m]^{k+1} - [K_m]^k}{[K_m]^k} \tag{84b}$$

$$= [K_m - 1] \tag{84c}$$

The relative change in the weight vector per iteration is of course proportional to the slope of the transient. Greater slopes imply faster transients, and smaller slopes imply slower transients. Substituting Eq (82b) into Eq (84c), we find the relative change per iteration to be:

$$\text{Relative change} = -\frac{1}{N} \tag{85}$$


If the transient were to continue at this slope, the transient terms would go to zero in an interval of precisely N iterations. This interval is the equivalent time constant of the iteration on the weight vector. In fact the transient essentially runs for four time constants. Note that the time constant for the equal size eigenvalue problem is the same as the filter length. Thus if the information matrix can hold the eigenvalues to near the same values, the transient will run for approximately four filter lengths. We also conclude that longer filters, while exhibiting smaller mean square error, have longer transient intervals.
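
The "four time constants" rule can be illustrated with a short calculation. The Python sketch below is a minimal example under our own assumptions: a hypothetical filter length N = 32 and the equal-eigenvalue contraction constant of Eq (82b). It evaluates the residual factor $[K_m]^k$ of Eq (83) after N and after 4N iterations and compares it with the exponential decay it approximates.

```python
import math


def contraction_equal(n):
    """Eq (82b): contraction constant when all N eigenvalues are equal."""
    return 1.0 - 1.0 / n


def residual(n, iterations):
    """Eq (83): fraction of the initial misalignment left after the given iterations."""
    return contraction_equal(n) ** iterations


n = 32                                   # hypothetical filter length
after_one_tc = residual(n, n)            # one time constant, near exp(-1)
after_four_tc = residual(n, 4 * n)       # four time constants, near exp(-4)
relative_change = contraction_equal(n) - 1.0   # Eq (84c) and Eq (85): -1/N
```

For this filter length the residual after N iterations is close to exp(-1), and after 4N iterations roughly two percent of the initial misalignment remains, which is the sense in which the transient is essentially over.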

Now let us consider the case where there is a large spread of the eigenvalues $C_m$. In particular, let us consider the case for which all but one of the eigenvalues are of the same size. The one exception is either larger ($C_{max}$) or smaller ($C_{min}$) by a factor $\beta$. Then the contraction constant $K_m(k)$ is of the form shown in Eq (86).

For the exceptional eigenvalue:

$$K_{max/min}(\beta) = \frac{[(N-1)+\beta] - \beta}{[(N-1)+\beta]} \tag{86a}$$

$$= 1 - \frac{\beta}{[(N-1)+\beta]} \tag{86b}$$

For the remaining eigenvalues:

$$K_m(1) = \frac{[(N-1)+\beta] - 1}{[(N-1)+\beta]} \tag{86c}$$

$$= 1 - \frac{1}{[(N-1)+\beta]} \tag{86d}$$

The equivalent time constants are approximately $[(N-1)/\beta + 1]$ for the exceptional eigenvalue and approximately $(N+\beta)$ for the remaining eigenvalues. If $\beta$ is much smaller than unity, indicating a single small eigenvalue, the corresponding time constant becomes very large. The remaining time constants are essentially unchanged. On the other hand, if $\beta$ is significantly larger than unity, indicating a single large eigenvalue, the corresponding time constant becomes smaller. The remaining time constants increase from approximately "N" to "N+$\beta$". Note that either way, if there is an exceptionally large or an exceptionally small eigenvalue, there is an increase in at least one time constant.
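
The effect of a single exceptional eigenvalue on the time constants can be tabulated directly from Eq (86). The Python sketch below is illustrative only: the function names and the choices N = 16 and beta in {0.1, 1, 10} are our own assumptions, and each time constant is taken as 1/(1 - K), consistent with the slope argument of Eq (84) and Eq (85).

```python
def contraction_constants(n, beta):
    """Eq (86b) and Eq (86d): (exceptional, remaining) contraction constants."""
    total = (n - 1) + beta
    return 1.0 - beta / total, 1.0 - 1.0 / total


def time_constants(n, beta):
    """Time constants 1/(1 - K) for the exceptional and the remaining modes."""
    total = (n - 1) + beta
    return total / beta, total       # (N-1)/beta + 1  and  (N-1) + beta


n = 16                               # hypothetical filter length
equal_case = time_constants(n, 1.0)      # all eigenvalues equal
small_case = time_constants(n, 0.1)      # one eigenvalue ten times smaller
large_case = time_constants(n, 10.0)     # one eigenvalue ten times larger

# The slowest mode sets the overall convergence time in each case.
slowest = {
    "equal": max(equal_case),
    "small": max(small_case),
    "large": max(large_case),
}
```

In this example the slowest time constant grows from 16 iterations in the equal case to about 151 iterations when one eigenvalue is ten times smaller, and to 25 iterations when one eigenvalue is ten times larger, so either kind of spread lengthens the transient.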

If only one time constant increases, the total convergence time must increase. We note that any mix of small and large eigenvalues would lead to the same conclusion. Thus if there is a large spread of eigenvalues, the algorithm will exhibit long transient times. The function of the minimum variance or fading memory information matrices is to force nearly constant eigenvalues and thus ensure short convergence times. Note that the convergence rates are dependent upon the ratio of the eigenvalues and not their absolute sizes.